Bayesian Spam Filtering Mechanism Based on Decision Tree of Attribute Set Dependence in the MapReduce Framework
Abstract
Bayesian spam filtering is a classification method based on probability theory and statistics. Bayesian spam filtering based on MapReduce addresses a defect of traditional Bayesian spam filtering, which consumes large amounts of system and network resources when pre-training on the mail set. However, mails must be classified manually in the pre-training phase of the mail set, which consumes considerable human and financial resources and reduces the efficiency of the system. This paper presents a Bayesian spam filtering mechanism based on a decision tree of attribute set dependence in the MapReduce framework. The decision tree of attribute set dependence is used in the training stage of the mail set, which improves the execution efficiency of the system by lowering the time complexity.
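As a rough illustration of the baseline idea only, the sketch below trains a naive Bayes spam classifier by expressing word counting as a map step and a reduce step, run here in a single process. It does not implement the decision tree of attribute set dependence, which the abstract does not specify. The function names (`map_mail`, `reduce_counts`, `train`, `classify`), the whitespace tokenizer, the Laplace smoothing, and the toy mails are all assumptions made for this example.

```python
import math
from collections import Counter
from functools import reduce


def map_mail(label, text):
    """Map step (hypothetical): count (label, word) pairs for one labeled mail."""
    return Counter((label, word) for word in text.lower().split())


def reduce_counts(left, right):
    """Reduce step (hypothetical): merge two partial count tables."""
    return left + right


def train(mails):
    """Combine per-mail counts into global word counts and per-label mail counts."""
    word_counts = reduce(
        reduce_counts, (map_mail(label, text) for label, text in mails), Counter()
    )
    label_counts = Counter(label for label, _ in mails)
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-posterior under naive Bayes."""
    vocabulary = {word for _, word in word_counts}
    total_mails = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, mail_count in label_counts.items():
        total_words = sum(c for (l, _), c in word_counts.items() if l == label)
        score = math.log(mail_count / total_mails)  # log prior
        for word in text.lower().split():
            # Laplace (add-one) smoothing, an assumption of this sketch
            numerator = word_counts[(label, word)] + 1
            score += math.log(numerator / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data for illustration only
mails = [
    ("spam", "win free money now"),
    ("spam", "free prize win"),
    ("ham", "meeting agenda for monday"),
    ("ham", "project status meeting"),
]
word_counts, label_counts = train(mails)
print(classify("free money", word_counts, label_counts))
print(classify("status meeting", word_counts, label_counts))
```

In an actual MapReduce deployment the map calls would run in parallel over partitions of the mail set and the reduce step would merge their partial counts; the single-process `reduce` here only mirrors that structure.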
Similar resources
Survey of Spam Filtering Techniques and Tools, and MapReduce with SVM
Spam is unsolicited junk email that takes a variety of shapes and forms. Various techniques are used to filter it, among them the Naïve Bayesian classifier and the Support Vector Machine (SVM). A number of spam filtering tools, both paid and free, are also available. Among all techniques, SVM is the most widely used. SVM is computationally intensive and for training it can’t work...
Detecting Image Spam Using Image Texture Features
Filtering image email spam is considered a challenging problem because spammers keep modifying the images used in their campaigns by employing different obfuscation techniques. These techniques prevent text recognition by Optical Character Recognition (OCR) tools and impose additional challenges in filtering this type of spam. In this paper, we propose an image spam filtering tech...
Provide a Predictive Model to Identify People with Diabetes Using the Decision Tree
Background: Today, in most hospitals in Iran, there is an extensive database of patient characteristics that includes a large amount of information related to medical, family and medical records. Finding a knowledge model of this information can help to predict the performance of the medical system and improve educational processes. Methods: Data mining techniques are analytical tools that are...
A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset
Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationary and uncertainty of the signals should be used for ECG analysis. Ex...
Predicting Twist Condition by Bayesian Classification and Decision Tree Techniques
Railway infrastructures are among the most important national assets of countries. Most of the annual budget of infrastructure managers is spent on repairing, improving and maintaining railways. The best repair method should consider all economic and technical aspects of the problem. In recent years, data analysis of maintenance records has contributed significantly to minimizing the costs. B...
Publication date: 2015